Off-Policy Q-Learning for Anti-Interference Control of Multi-Player Systems



Similar resources

A Deep Policy Inference Q-Network for Multi-Agent Systems

We present DPIQN, a deep policy inference Q-network that targets multi-agent systems composed of controllable agents, collaborators, and opponents that interact with each other. We focus on one challenging issue in such systems, modeling agents with varying strategies, and propose to employ “policy features” learned from raw observations (e.g., raw images) of collaborators and opponents by inferr...


Off-policy reinforcement learning for H∞ control design

The H∞ control design problem is considered for nonlinear systems with an unknown internal system model. It is known that the nonlinear H∞ control problem can be transformed into solving the so-called Hamilton-Jacobi-Isaacs (HJI) equation, a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, model-based approaches cannot be used for...


Integral Q-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems

This paper proposes an integral Q-learning for continuous-time (CT) linear time-invariant (LTI) systems, which solves a linear quadratic regulation (LQR) problem in real time for a given system and a value function, without knowledge about the system dynamics A and B. Here, Q-learning is referred to as a family of reinforcement learning methods which find the optimal policy by interaction with ...


Policy Gradient Methods for Off-policy Control

Off-policy learning refers to the problem of learning the value function of a way of behaving, or policy, while following a different policy. Gradient-based off-policy learning algorithms, such as GTD and TDC/GQ [13], converge even when using function approximation and incremental updates. However, they have been developed for the case of a fixed behavior policy. In control problems, one would ...


P14: Anxiety Control Using Q-Learning

Anxiety disorders are the most common reasons for referral to specialized clinics. If the response to stress is changed, anxiety can be greatly controlled. The most obvious effect of stress occurs on the circulatory system, especially through sweating. The electrical conductivity of the skin, in other words the Galvanic Skin Response (GSR), which depends on stress level, is used; beside this parameter pe...


Journal

Journal title: IFAC-PapersOnLine

Year: 2020

ISSN: 2405-8963

DOI: 10.1016/j.ifacol.2020.12.2180